transformer-based neural network
Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting
Sen, Anuvab, Mazumder, Arul Rhik, Sen, Udayon
Accurate load forecasting plays a vital role in numerous sectors, but capturing the complex dynamics of modern power systems remains a challenge for traditional statistical models. Consequently, time-series models such as ARIMA and deep-learning models such as ANNs, LSTMs, and GRUs are commonly deployed and often achieve greater success. In this paper, we analyze the efficacy of the recently developed Transformer-based neural network model for load forecasting. Transformer models have the potential to improve load forecasting because their attention mechanism allows them to learn long-range dependencies. We apply several metaheuristics, notably Differential Evolution, to find the optimal hyperparameters of the Transformer-based neural network and produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares Transformer-based neural network models integrated with different metaheuristic algorithms on load-forecasting performance, measured by numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based neural network models to improve load-forecasting accuracy, and we provide the optimal hyperparameters for each model.
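The abstract's core idea can be sketched as code. Below is a minimal DE/rand/1/bin optimizer applied to a toy surrogate loss standing in for "validation error as a function of hyperparameters" (here, a hypothetical log learning rate and layer count); the surrogate, bounds, and all parameter names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin: mutate, crossover, greedy selection."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fitness = np.array([objective(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: combine three distinct candidates other than i
            idx = rng.choice([j for j in range(pop_size) if j != i],
                             3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # binomial crossover with at least one mutant component
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f = objective(trial)
            if f <= fitness[i]:  # keep the trial only if it is no worse
                pop[i], fitness[i] = trial, f
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

# Hypothetical surrogate for validation loss over two hyperparameters:
# minimum at log10(learning_rate) = -2 and num_layers = 4.
def surrogate_loss(x):
    log_lr, layers = x
    return (log_lr + 2.0) ** 2 + 0.1 * (layers - 4.0) ** 2

best, loss = differential_evolution(surrogate_loss,
                                    bounds=[(-5.0, 0.0), (1.0, 12.0)])
```

In a real pipeline, `surrogate_loss` would be replaced by training the Transformer with the candidate hyperparameters and returning its validation MSE, which is exactly the kind of non-differentiable objective DE handles well.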
Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data
Pleiter, Bart, Tajalli, Behrad, Koffas, Stefanos, Abad, Gorka, Xu, Jing, Larson, Martha, Picek, Stjepan
Deep Neural Networks (DNNs) have shown great promise in various domains. Alongside these developments, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during model training, allowing for manipulated predictions. More recently, DNNs for tabular data have gained increasing attention due to the rise of transformer models. Our research presents a comprehensive analysis of backdoor attacks on tabular data using DNNs, particularly focusing on transformers. Given the inherent complexities of tabular data, we explore the challenges of embedding backdoors. Through systematic experimentation across benchmark datasets, we uncover that transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, even with minimal feature value alterations. We also verify that our attack generalizes to other models, such as XGBoost and DeepFM. Our results indicate nearly perfect attack success rates (approximately 100%) using novel backdoor attack strategies for tabular data. Furthermore, we evaluate several defenses against these attacks, identifying Spectral Signatures as the most effective. Our findings highlight the urgency of addressing such vulnerabilities and provide insights into potential countermeasures for securing DNN models against backdoors in tabular data.
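To make the attack setting concrete, here is a minimal sketch of training-set poisoning for tabular data: a small fraction of rows gets a fixed value stamped into one feature column (the trigger) and is relabeled to the attacker's target class. The function name, the single-column trigger, and the poison rate are illustrative assumptions, not the paper's exact strategies.

```python
import numpy as np

def poison_tabular(X, y, trigger_col, trigger_value, target_label,
                   poison_rate=0.05, seed=0):
    """Stamp a feature-value trigger into a random fraction of rows
    and relabel those rows to the attacker-chosen class."""
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    n_poison = max(1, int(poison_rate * len(X)))
    idx = rng.choice(len(X), n_poison, replace=False)
    Xp[idx, trigger_col] = trigger_value  # minimal feature alteration
    yp[idx] = target_label                # flipped label
    return Xp, yp, idx
```

A model trained on `(Xp, yp)` learns to associate the trigger value with `target_label`; at inference time, stamping the same value into any row steers the prediction, which is the behavior the attack-success-rate metric measures.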
Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network
Bausch, Johannes, Senior, Andrew W, Heras, Francisco J H, Edlich, Thomas, Davies, Alex, Newman, Michael, Jones, Cody, Satzinger, Kevin, Niu, Murphy Yuezhen, Blackwell, Sam, Holland, George, Kafri, Dvir, Atalaya, Juan, Gidney, Craig, Hassabis, Demis, Boixo, Sergio, Neven, Hartmut, Kohli, Pushmeet
Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On distances up to 11, the decoder maintains its advantage on simulated data with realistic noise including cross-talk, leakage, and analog readout signals, and sustains its accuracy far beyond the 25 cycles it was trained on. Our work illustrates the ability of machine learning to go beyond human-designed algorithms by learning from data directly, highlighting machine learning as a strong contender for decoding in quantum computers.
Exploring the Model Behind ChatGPT: How the Bot Works
ChatGPT is a powerful language model developed by OpenAI, based on the GPT-3.5 architecture, and it is used in a variety of applications, including chatbots, virtual assistants, and content creation. In this article, we'll take a closer look at the model behind ChatGPT and explore how the bot works. ChatGPT is a deep learning model that uses natural language processing (NLP) to generate text-based responses to user input. The GPT-3.5 architecture it is built on improves upon GPT-3 and is a transformer-based neural network that uses self-attention mechanisms to process input sequences.
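The self-attention mechanism mentioned above can be illustrated with a minimal single-head, scaled dot-product implementation in NumPy; this is a textbook sketch of the general mechanism, not OpenAI's actual code, and the matrix shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V: (seq_len, d_k) query, key, and value matrices.
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    # similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over keys, stabilized by subtracting the row-wise max
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output position is a weighted mix of all value vectors
    return weights @ V, weights
```

Because every position attends to every other position in one step, the model can relate distant tokens directly, which is what lets transformer-based models process whole input sequences rather than reading them strictly left to right.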